

2023-01-19 Using GREP with Perl regex

Use Grep to find an inclusive set of values in any order

I needed to search my blog posts for those that contain all of several words, in no particular order.

As a Linux user for many years I have generally used the grep utility to find them.

grep can match on multiple values, but these are 'or' matches, whereas in this case I want posts that contain all of the values, in any order.

My normal way of doing this would be to pipe one grep into another:

grep -ri -e bike | grep -i -e gravel
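To make the difference between the 'or' match and the piped 'and' match concrete, here is a minimal sketch. The directory, file names and file contents are invented purely for illustration, and it assumes a grep that supports -r.

```shell
# Hypothetical sample posts in a throwaway directory (not my real blog files).
demo_dir=$(mktemp -d)
printf 'My gravel bike is ready\n'  > "$demo_dir/both.md"
printf 'A road bike review\n'       > "$demo_dir/bike_only.md"
printf 'Fresh gravel on the path\n' > "$demo_dir/gravel_only.md"

# Several -e patterns are OR-ed together, so all three files are listed.
or_matches=$(grep -rli -e bike -e gravel "$demo_dir" | sort)

# Piping narrows the result: the first grep prints "file:line" for every
# line containing 'bike', the second keeps only those that also mention
# 'gravel'. Note the second grep sees the "file:" prefix as well as the text.
and_matches=$(grep -ri -e bike "$demo_dir" | grep -i -e gravel)

echo "OR matches:";  echo "$or_matches"
echo "AND matches:"; echo "$and_matches"

rm -rf "$demo_dir"
```

The piped version only keeps lines where both words appear together, because each grep works on one line at a time.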

But I have discovered a better way using the Perl regular expression option in grep.

grep -lri -P -e '(?=.*?bike)(?=.*?gravel)^.*$'

Gives:

2022-12-30 Bike life/page.md
2023-01-11 First bike ride this year/page.md

Which is just what I wanted. The options used are:

-l just print the file name
-i ignore-case
-r recurse through files and directories
-P use Perl regular expression engine
-e pattern or text to match
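One thing worth keeping in mind is that grep tests each line on its own, so the lookahead pattern only finds files where both words sit on the same line. The sketch below shows this on some made-up files (names and contents are hypothetical) and, as an assumption about what you might want instead, a file-level alternative that chains two grep -l runs through xargs. It assumes GNU grep with -P support.

```shell
# Hypothetical sample posts in a throwaway directory.
demo_dir=$(mktemp -d)
printf 'Took the gravel bike out today\n' > "$demo_dir/same_line.md"
printf 'New bike day\nLots of gravel\n'   > "$demo_dir/split_lines.md"
printf 'Just a bike\n'                    > "$demo_dir/one_word.md"

# The command from this post: lists files having a single line with both words.
per_line=$(grep -lri -P -e '(?=.*?bike)(?=.*?gravel)^.*$' "$demo_dir")

# File-level alternative: list files containing 'bike' anywhere, then keep
# those that also contain 'gravel' anywhere. --null / -0 keep file names
# with spaces (like my post directories) intact.
whole_file=$(grep -rli --null -e bike "$demo_dir" | xargs -0 grep -li -e gravel | sort)

echo "Same line:";  echo "$per_line"
echo "Whole file:"; echo "$whole_file"

rm -rf "$demo_dir"
```

Here the lookahead command reports only same_line.md, while the xargs variant also reports split_lines.md, where the two words are on different lines.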

It is now some years since I regularly used Perl in my work, but I still remember the power of the Perl regular expression engine and the awe I had for those ninjas who truly understood how it worked. While I would never pretend that I mastered regular expressions, I am glad of the experience. It has stood me in good stead as I have developed applications in many different languages, often using the regex implementation of whichever language I am using at the time.

The magic of the Perl regex: '(?=.*?bike)(?=.*?gravel)^.*$'

This uses lookaheads to find all of the matches in any order.

Lookaheads are zero-length assertions, so they do not move on through the search text. These look ahead first for the word 'bike', then for the word 'gravel'. Because the current search position does not move from the start of the line, every search is effectively from the beginning of the line, and so the words will match in any order.
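A small sketch of that order independence, feeding a few invented lines straight into grep (the sample text is hypothetical, and GNU grep with -P is assumed). For contrast it also runs a plain pattern without lookaheads, which only matches one ordering.

```shell
# Four made-up lines: both orders, plus one line for each word alone.
sample_lines='bike then gravel
gravel then bike
only a bike
only gravel'

# Both lookaheads start from the beginning of the line, so order is irrelevant.
any_order=$(printf '%s\n' "$sample_lines" | grep -i -P -e '(?=.*?bike)(?=.*?gravel)^.*$')

# Without lookaheads the pattern consumes text, so 'bike' must come first.
fixed_order=$(printf '%s\n' "$sample_lines" | grep -i -P -e 'bike.*gravel')

echo "Lookaheads:";    echo "$any_order"
echo "Plain pattern:"; echo "$fixed_order"
```

The lookahead version keeps the first two lines, the plain pattern keeps only the first, and the single-word lines are dropped by both.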

Look at this Regex page on Mastering Lookahead and Lookbehind for a better explanation.

Also try the Regex test site, which allows you to build, test, and debug regular expressions.

In the match spec (?=.*?gravel) we have a lookahead, indicated by the ?=, which asserts that what immediately follows the current position in the string is .*?gravel. Here the . represents any character (other than a line terminator) and *? is a lazy quantifier (see the link below for more information on lazy quantifiers): it matches the previous token (any character) between zero and unlimited times, but as few times as possible, until the characters 'gravel' can be matched.

See an explanation of lazy quantifiers
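To see what 'as few times as possible' means in practice, here is a rough sketch comparing the lazy .*? with the greedy .* outside of a lookahead. The sample sentence is invented, GNU grep with -P is assumed, and -o is used so that grep prints only the matched text rather than the whole line.

```shell
# Made-up text containing 'gravel' twice.
text='one gravel track, two gravel tracks'

# Lazy: stop at the first 'gravel'. -o prints every match on the line,
# so head keeps just the first one.
lazy=$(printf '%s\n' "$text" | grep -o -P -e '.*?gravel' | head -n 1)

# Greedy: run on as far as possible, ending at the last 'gravel'.
greedy=$(printf '%s\n' "$text" | grep -o -P -e '.*gravel' | head -n 1)

echo "lazy:   $lazy"
echo "greedy: $greedy"
```

Inside a lookahead the choice between lazy and greedy does not change which lines match, since the assertion only asks whether 'gravel' can be found at all. It only affects how the engine goes about looking for it.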

© Jeremy Smith